Mutual Information Estimation and the Speech Re
Abstract
In the following optimization of the discriminative functions, the extended BW algorithm (12) was applied to re-estimate the mixtures of the phoneme models. All phoneme alternatives within the automatically derived phoneme segmentation were used to calculate the discriminance measure. According to (6), the mixtures of the correct model and of all competing models were updated to minimize the objective function. The best MCE results were obtained with the update equation (15) and a smoothing parameter of 1.0, which gave stable learning and a phoneme recognition rate of 64.8 % on the test set. With a value of 0.3, the objective function and the error rate decreased monotonically, but the recognition performance within the 5 iterations was slightly worse (62.9 %).

MMI training with the extended BW algorithm was stable for a smoothing parameter of 0.3, with the objective function and the recognition rate increasing monotonically. The MMI objective is less robust than the MCE function and therefore requires smaller 'step sizes'. The MMI optimization achieved a phoneme recognition rate of 62.0 %, an improvement of 2.5 points over the ML baseline system. In these experiments, MCE training was more stable than MMI optimization and gave higher phoneme recognition rates.

Furthermore, an alternative calculation of the discriminance measure, based on the best phoneme sequence hypothesis for the utterance, was applied. Here the correct description of the utterance was compared with its best hypothesis, derived with a looped phoneme model without a lexicon or language model, to compute the objective function. Since the two descriptions differ only in some parts of the sentence, only these differing segments were actually used in discriminative training. Consequently, only small parts of the training data entered the discriminative parameter re-estimation.
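The segment selection for the best-hypothesis discriminance measure can be illustrated with a small sketch. It assumes each description is available as a list of phoneme labels and uses Python's `difflib.SequenceMatcher` as a stand-in for whatever (probably time-based) alignment the authors actually used; the helper names `differing_segments` and `fraction_used` and the phoneme labels are hypothetical. Only the non-matching spans it returns would be passed on to discriminative training.

```python
from difflib import SequenceMatcher


def differing_segments(reference, hypothesis):
    """Return (reference_span, hypothesis_span) pairs where two label sequences disagree.

    Hypothetical helper: a label-level alignment, not the paper's segmentation.
    """
    matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
    segments = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            segments.append((reference[i1:i2], hypothesis[j1:j2]))
    return segments


def fraction_used(reference, segments):
    """Share of reference labels that fall inside a differing segment."""
    return sum(len(ref_span) for ref_span, _ in segments) / len(reference)


# Hypothetical example: correct transcription vs. best looped-phoneme hypothesis
ref = ["sil", "d", "a", "s", "h", "au", "s", "sil"]
hyp = ["sil", "d", "a", "s", "h", "o", "s", "sil"]
segs = differing_segments(ref, hyp)
print(segs)                      # [(['au'], ['o'])]
print(fraction_used(ref, segs))  # 0.125
```

In this toy utterance only one of eight reference labels would contribute, which mirrors the observation that only small parts of the training data are used.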
Only MCE optimization was examined with this best-hypothesis discriminance measure; it gave a minor improvement of 1.7 points, to a phoneme recognition rate of 61.2 %, within the same number of iterations. Using 'N-best' alternative hypotheses would improve this training scheme by generating more competing hypotheses for discriminative training.

4. CONCLUSIONS

In this paper, the MCE and MMI objectives were discussed. For the HMM parameter update, an extended BW re-estimation formula was proposed that can be used for both discriminative methods. It was applied in speaker-independent phoneme recognition experiments and, for the MCE function, improved the recognition rate by 5.3 points, from 59.5 % to 64.8 %. In our experiments MCE …
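The point that one re-estimation formula serves both criteria can be sketched as follows. The objectives are the common textbook forms (MMI with equal model priors; MCE as a sigmoid of a soft-max misclassification measure), and the update is the classic growth-transform form of extended Baum-Welch applied to a vector of mixture weights. Whether the paper's equations (6), (12) and (15) take exactly these forms is an assumption, and the names `mmi_objective`, `mce_loss`, `ebw_update`, `eta`, `gamma` and `D` are hypothetical; the paper's smoothing parameter may enter its update differently. Because `ebw_update` only consumes the gradient dF/dc_k, it can be fed gradients of either objective, and a larger `D` gives a smaller step.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmi_objective(loglik, correct):
    """MMI criterion for one segment, assuming equal model priors:
    log p(x | correct) - log sum_j p(x | j). Training maximizes it."""
    return loglik[correct] - logsumexp(loglik)


def mce_loss(loglik, correct, eta=1.0, gamma=1.0):
    """Sigmoid-smoothed MCE loss for one segment. Training minimizes it.

    d = -g_correct + (1/eta) * log(mean_j exp(eta * g_j)) over competitors j.
    """
    competitors = [g for j, g in enumerate(loglik) if j != correct]
    anti = (logsumexp([eta * g for g in competitors])
            - math.log(len(competitors))) / eta
    d = anti - loglik[correct]  # d > 0 means the competitors score higher
    z = gamma * d
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ebw_update(weights, grad, D):
    """Extended BW growth transform for weights that sum to one:
    c_k' = c_k (dF/dc_k + D) / sum_j c_j (dF/dc_j + D).
    F is the criterion to increase (for MCE, use F = -loss); larger D -> smaller step.
    """
    if any(g + D <= 0 for g in grad):
        raise ValueError("D must make every dF/dc_k + D positive")
    num = [w * (g + D) for w, g in zip(weights, grad)]
    total = sum(num)
    return [n / total for n in num]


# Hypothetical per-model log-likelihoods for one segment; model 0 is correct.
loglik = [-10.0, -12.0, -15.0]
print(mmi_objective(loglik, 0), mce_loss(loglik, 0))

# Hypothetical mixture weights and gradients dF/dc_k.
weights = [0.5, 0.3, 0.2]
grad = [2.0, -1.0, 0.5]
for D in (5.0, 50.0):
    print(D, [round(w, 3) for w in ebw_update(weights, grad, D)])
```

With `D = 50` the updated weights stay much closer to the starting values than with `D = 5`, which is one way to realise the smaller 'step sizes' that the MMI criterion needed.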